Artificial and Natural Topic Detection in Online Social Networks
Online Social Networks (OSNs), such as Twitter, offer attractive means of social interaction and communication, but they also raise privacy and security issues. OSNs hold information valuable for marketing and competitiveness in users' posts and opinions, stored in a huge volume of data spanning many themes, topics, and subjects. To mine the topics discussed on an OSN, we present a novel application of the Louvain method to topic modeling, based on detecting communities in graphs by modularity. The proposed approach succeeded in finding topics in five different datasets composed of textual content from Twitter and YouTube. Another important contribution concerns texts posted by spammers: a particular behavior observed in the graph's community structure (density and degree) indicates a topic's strength and allows it to be classified as natural or artificial, the latter being created by spammers on OSNs.
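The Louvain method greedily regroups nodes to increase modularity, so the score it optimizes is the core of the idea. The sketch below only builds a graph and scores a partition with Newman-Girvan modularity; it does not run Louvain itself. The graph construction is an assumption for illustration, not necessarily the authors' graph: words are nodes, and co-occurring in the same post adds weight to their edge.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(posts):
    """Assumed graph: undirected, weighted; each post adds 1 to every pair of distinct words it contains."""
    weights = defaultdict(float)
    for post in posts:
        words = sorted(set(post.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1.0
    return dict(weights)


def modularity(edges, community):
    """Q = sum over communities c of (L_c / m - (d_c / 2m)^2).

    edges maps (u, v) -> weight; community maps node -> community id.
    L_c is the edge weight inside c, d_c the total degree of c, m the total edge weight.
    """
    m = sum(edges.values())
    degree = defaultdict(float)
    internal = defaultdict(float)
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    total_degree = defaultdict(float)
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)
```

Consider two posts about football and two about an election. Splitting the words into those two topics scores Q = 0.5. Lumping every word into one community scores Q = 0. A community-detection method such as Louvain searches for the partition with the higher score.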
Recognition on Online Social Network by user's writing style
Compromising legitimate accounts is the most popular way of disseminating fraudulent content in Online Social Networks (OSNs). To address this issue, we propose an approach for recognizing compromised Twitter accounts based on Authorship Verification. Our solution detects accounts that have become compromised by analysing their users' writing styles. When an account's content does not match its user's writing style, we conclude that the account has been compromised, as in Authorship Verification. Our approach follows the profile-based paradigm and uses N-grams as its kernel. A threshold is then found to represent the boundary of an account's writing style. Experiments were performed using two subsampled datasets from Twitter. Experimental results showed that the developed model is well suited to recognizing compromised OSN accounts, as it recognized user styles with over 95% accuracy on both datasets.
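A minimal sketch of the profile-based paradigm, under stated assumptions. The abstract specifies N-grams and a threshold but not the similarity measure. Character trigrams and cosine similarity are choices made here for illustration. The function names are hypothetical. All known posts of a user are merged into a single profile, and a new post is flagged when its similarity to that profile falls below the threshold.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (n=3 is an assumed default)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(posts, n=3):
    """Profile-based paradigm: one merged n-gram profile per user."""
    profile = Counter()
    for post in posts:
        profile.update(char_ngrams(post, n))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors (assumed measure)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_compromised(profile, new_post, threshold, n=3):
    """Flag a post whose style falls outside the user's boundary (similarity below threshold)."""
    return cosine(profile, char_ngrams(new_post, n)) < threshold
```

Suppose a profile is built from a user's casual "good morning, coffee" posts. A similar post then stays above an illustrative threshold of 0.3, while a spam-like "CLICK HERE to WIN" post falls far below it.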
Comparing Concept Drift Detection with Process Mining Software
Organisations have seen a rise in the volume of recorded data corresponding to business processes. Handling process data is a meaningful way to extract relevant information from business processes, with an impact on the company's value. Nonetheless, business processes are subject to change during their execution, which adds complexity to their analysis. This paper evaluates currently available process mining tools and software that handle concept drift, i.e. changes over time in the statistical properties of the events occurring in a process. We provide an in-depth analysis of these tools, comparing their differences, advantages, and disadvantages by testing them against a log taken from a Process Control System. By highlighting the trade-offs between the tools, the paper gives stakeholders the best options for their use case.
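To make the definition of concept drift concrete, here is a simple sketch. It is not how any of the evaluated tools work. It assumes that the monitored "statistical property" is the relative frequency of each activity in an event stream. Adjacent, non-overlapping windows are compared with total variation distance, and the window size and threshold are illustrative parameters.

```python
from collections import Counter


def activity_distribution(events):
    """Relative frequency of each activity label in a window (assumed statistic)."""
    counts = Counter(events)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in p.keys() | q.keys())


def detect_drift(events, window, threshold):
    """Return the positions where the window before and the window after differ by more than threshold."""
    drifts = []
    for start in range(window, len(events) - window + 1, window):
        before = activity_distribution(events[start - window:start])
        after = activity_distribution(events[start:start + window])
        if total_variation(before, after) > threshold:
            drifts.append(start)
    return drifts
```

Consider a log that switches from the pattern A, B, C to A, D, E after 30 events. With windows of 15 events, a drift is reported only at position 30.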
Robust computer vision system for marbling meat segmentation
In this study, we developed a robust automatic computer vision system for marbling meat segmentation. Our approach can segment muscle fat in various marbled meat samples using images acquired with devices of differing quality in an uncontrolled environment, with both external ambient light and artificial light. Professionals can therefore apply the method without specialized knowledge of sample treatment or equipment and without disrupting normal procedures. The proposed approach for marbling segmentation is based on data clustering and dynamic thresholding. Experiments were performed using two datasets comprising 82 images of 41 longissimus dorsi muscles acquired by different sampling devices. The experimental results showed that the computer vision system performed well, with over 98% accuracy and a low number of false positives, regardless of the acquisition device employed.
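Clustering combined with dynamic thresholding can be illustrated with a minimal sketch. This is not the authors' pipeline. It works on a grayscale image given as nested lists and assumes that fat appears brighter than lean muscle. One-dimensional k-means with k = 2 is run on the pixel intensities, and the midpoint between the two centroids becomes an image-specific threshold. The function names are hypothetical.

```python
def two_means_threshold(intensities, iterations=50):
    """1-D k-means (k=2) on pixel intensities; returns the midpoint between the two centroids."""
    low, high = float(min(intensities)), float(max(intensities))
    for _ in range(iterations):
        threshold = (low + high) / 2
        darker = [v for v in intensities if v <= threshold]
        brighter = [v for v in intensities if v > threshold]
        if not darker or not brighter:
            break  # only one cluster present (e.g. a uniform image)
        new_low = sum(darker) / len(darker)
        new_high = sum(brighter) / len(brighter)
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    return (low + high) / 2


def segment_fat(image):
    """Label pixels above the image's own threshold as fat (1), the rest as muscle (0).

    Assumes fat is brighter than muscle in the grayscale image.
    """
    flat = [v for row in image for v in row]
    t = two_means_threshold(flat)
    return [[1 if v > t else 0 for v in row] for row in image]
```

The threshold is recomputed for every image, so a uniformly brighter capture of the same sample (for example, every pixel +40 from a different device or light source) still yields the same mask. A fixed global threshold would not guarantee this.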
Recognition of Compromised Accounts on Twitter
In this work, we propose an approach for recognizing compromised Twitter accounts based on Authorship Verification. Our solution detects accounts that have become compromised by analysing their users' writing styles. When an account's content does not match its user's writing style, we conclude that the account has been compromised, as in Authorship Verification. Our approach follows the profile-based paradigm and uses N-grams as its kernel. A threshold is then found to represent the boundary of an account's writing style. Experiments were performed using a subsampled dataset from Twitter. Experimental results showed that the developed model is well suited to recognizing compromised OSN accounts, as it recognized user styles with over 95% accuracy.
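The abstract says only that a threshold "is found" to bound an account's writing style. One plausible way to find it is leave-one-out calibration on the user's own posts, and that method is an assumption here, as are character trigrams and cosine similarity. Each known post is scored against a profile built from the user's other posts. The lowest score, minus an optional margin, becomes the boundary.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Overlapping character n-grams (n=3 assumed)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between sparse count vectors (assumed measure)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def calibrate_threshold(posts, n=3, margin=0.0):
    """Hypothetical leave-one-out calibration of the style boundary for one user."""
    scores = []
    for i, held_out in enumerate(posts):
        profile = Counter()
        for j, post in enumerate(posts):
            if j != i:
                profile.update(ngrams(post, n))
        scores.append(cosine(profile, ngrams(held_out, n)))
    return min(scores) - margin
```

The margin trades false alarms against missed compromises. A larger margin accepts more of the user's unusual posts but lets more foreign content through.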
M-Health Solution on pre-diagnosis of larynx
The study of new approaches that seek to improve the diagnosis of vocal fold pathologies is one of the main motivators of voice-based health research. This involves not only creating new techniques but also applying existing technologies in new ways, such as using mobile technologies in fields where they have been barely explored, if at all. This article aims to further develop the field of m-health, specifically the early diagnosis of vocal fold diseases through analysis of the fundamental frequency of the speaker’s vocalization, improving support for medical decisions. The results show 95% with a minimum difference of 10 Hz.
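The abstract does not say how the fundamental frequency (F0) is measured. As an illustration, the sketch below uses standard autocorrelation pitch estimation, which is a swapped-in technique and not necessarily the one the m-health solution uses. The pitch search range of 75 to 500 Hz is an assumed default for adult voices. The estimate is the lag, within that range, at which the signal best matches a shifted copy of itself.

```python
import math


def estimate_f0(signal, sample_rate, f_min=75.0, f_max=500.0):
    """Return F0 in Hz as sample_rate / lag, for the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, min(max_lag, len(signal) - 1) + 1):
        # Unnormalised autocorrelation: shorter overlaps at longer lags score lower,
        # which discourages picking a multiple of the true period.
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag
```

Because lags are whole samples, resolution is limited. At 8 kHz, the neighbouring estimates around 200 Hz are 200.0 Hz (lag 40) and about 195.1 Hz (lag 41). That matters when comparing frequencies only a few hertz apart, and interpolating around the peak is a common refinement.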
Stock Portfolio Prediction by Multi-Target Decision Support
Investing in the stock market is a complex process due to its high volatility, caused by factors such as exchange rates, political events, inflation, and market history. To support investors' decisions, predicting future stock prices and economic metrics is valuable. Under the hypothesis that investment performance indicators are related to one another, the goal of this paper was to explore multi-target regression (MTR) methods for estimating 6 different indicators, and to find the method, in terms of predictive performance, best suited to an automated prediction tool for decision support. The experiments were based on 4 datasets corresponding to 4 different time periods, each composed of 63 combinations of weights of stock-picking concepts, simulated in the US stock market. We compared traditional machine learning approaches with seven state-of-the-art MTR solutions: Stacked Single Target, Ensemble of Regressor Chains, Deep Structure for Tracking Asynchronous Regressor Stacking, Deep Regressor Stacking, Multi-output Tree Chaining, Multi-target Augment Stacking, and Multi-output Random Forest (MORF). With the exception of MORF, the traditional approaches and the MTR methods were evaluated with Extreme Gradient Boosting, Random Forest, and Support Vector Machine regressors. Through extensive experimental evaluation, our results showed that the most recent MTR solutions can achieve suitable predictive performance, improving results in all scenarios (by 14.70% in the best one, considering all target variables and periods). MTR is therefore a proper strategy for building stock market decision support systems based on prediction models.
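Stacked Single Target (SST), the first MTR method listed, can be sketched in a few lines. The paper evaluated XGBoost, Random Forest, and SVM base regressors. Here a ridge regressor written in plain Python is swapped in so the sketch runs without external libraries. The helper names are hypothetical. Stage 1 fits one model per target. Stage 2 refits each target on the original features augmented with all stage-1 predictions, so each target can borrow information from the others. This sketch feeds in-sample stage-1 predictions for brevity; using out-of-fold predictions is a common refinement.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(X, y, lam=1e-3):
    """Ridge regression with an unpenalised intercept; returns [bias, w1, ..., wd].

    The small penalty keeps the system solvable even though stage-2 features are collinear.
    """
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(a, b)


def linear_predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


class StackedSingleTarget:
    """Two-stage SST: per-target models, then per-target models on X plus stage-1 predictions."""

    def fit(self, X, Y):
        n_targets = len(Y[0])
        self.stage1 = [fit_ridge(X, [row[t] for row in Y]) for t in range(n_targets)]
        X_aug = [list(x) + [linear_predict(w, x) for w in self.stage1] for x in X]
        self.stage2 = [fit_ridge(X_aug, [row[t] for row in Y]) for t in range(n_targets)]
        return self

    def predict(self, x):
        x_aug = list(x) + [linear_predict(w, x) for w in self.stage1]
        return [linear_predict(w, x_aug) for w in self.stage2]
```

With a linear base learner, the stage-2 inputs add no new information, so this mainly shows the mechanics. The gains reported for stacking come from non-linear base regressors whose per-target errors are correlated.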
Towards meta-learning for multi-target regression problems
Several multi-target regression methods have been developed in recent years, aiming to improve predictive performance by exploring inter-target correlation within the problem. However, none of these methods outperforms the others on all problems. This motivates the development of automatic approaches to recommend the most suitable multi-target regression method. In this paper, we propose a meta-learning system to recommend the best predictive method for a given multi-target regression problem. We performed experiments with a meta-dataset generated from a total of 648 synthetic datasets. These datasets were created to explore distinct inter-target characteristics toward recommending the most promising method. In the experiments, we evaluated four algorithms with different biases as meta-learners. Our meta-dataset is composed of 58 meta-features, based on statistical information, correlation characteristics, linear landmarking, and the distribution and smoothness of the data, and has four different meta-labels. Results showed that the induced meta-models were able to recommend the best method for different base-level datasets with a balanced accuracy above 70% using a Random Forest meta-model, which statistically outperformed the meta-learning baselines.
Comment: To appear at the 8th Brazilian Conference on Intelligent Systems (BRACIS)
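The recommendation loop has two parts: describe a dataset by meta-features, then map those features to a method. The sketch below keeps only two illustrative meta-features, the number of targets and the mean absolute correlation between pairs of targets, out of the 58 used in the paper. It uses a 1-nearest-neighbour meta-learner in place of the paper's Random Forest. The meta-dataset entries and the labels `method_A` and `method_B` are hypothetical placeholders, not results from the paper.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0


def meta_features(Y):
    """Two illustrative meta-features of a target matrix Y (rows = examples, columns = targets)."""
    columns = list(zip(*Y))
    pairs = list(combinations(columns, 2))
    mean_corr = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return (float(len(columns)), mean_corr)


def recommend(meta_dataset, Y):
    """1-NN meta-learner: return the meta-label of the most similar known problem.

    meta_dataset is a list of (meta_feature_tuple, method_label) pairs.
    """
    f = meta_features(Y)
    _, label = min(meta_dataset, key=lambda item: math.dist(item[0], f))
    return label
```

Raw meta-features live on different scales; here the target count would dominate the distance if it varied. A real meta-learner would normalise them first, or, like a Random Forest, be insensitive to scale.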